When Can You Get Away with Low Memory Adam?
Kalra, Dayal Singh, Kirchenbauer, John, Barkeshli, Maissam, Goldstein, Tom
Adam is the go-to optimizer for training modern machine learning models, but it requires additional memory to maintain the moving averages of the gradients and their squares. While various low-memory optimizers have been proposed that sometimes match the performance of Adam, their lack of reliability has left Adam as the default choice. In this work, we apply a simple layer-wise Signal-to-Noise Ratio (SNR) analysis to quantify when second-moment tensors can be effectively replaced by their means across different dimensions. Our SNR analysis reveals how architecture, training hyperparameters, and dataset properties impact compressibility along Adam's trajectory, naturally leading to $\textit{SlimAdam}$, a memory-efficient Adam variant. $\textit{SlimAdam}$ compresses the second moments along dimensions with high SNR when feasible, and leaves them uncompressed when compression would be detrimental. Through experiments across a diverse set of architectures and training scenarios, we show that $\textit{SlimAdam}$ matches Adam's performance and stability while saving up to $98\%$ of total second moments. Code for $\textit{SlimAdam}$ is available at https://github.com/dayal-kalra/low-memory-adam.
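The SNR test described in the abstract can be sketched as follows. This is a minimal illustration, not SlimAdam's actual implementation: the threshold value, the variance stabilizer, and the function names are assumptions for the example.

```python
import numpy as np

def snr_along(v, axis):
    """SNR of second-moment entries along `axis`: mean^2 / variance of v
    across that axis, averaged over the remaining dimensions. A high SNR
    means entries are nearly constant along `axis`, so replacing them
    with their mean loses little information."""
    mean = v.mean(axis=axis)
    var = v.var(axis=axis)
    return float((mean**2 / (var + 1e-12)).mean())  # 1e-12 guards against zero variance

def maybe_compress(v, axis, threshold=1.0):
    """Replace v by its mean along `axis` when the SNR clears the threshold,
    otherwise keep the full second-moment tensor."""
    if snr_along(v, axis) >= threshold:
        return v.mean(axis=axis, keepdims=True)  # broadcastable compressed moment
    return v

# Toy second-moment tensor that is nearly constant across rows (axis 0)
rng = np.random.default_rng(0)
v = np.tile(rng.uniform(0.1, 1.0, size=(1, 8)), (64, 1))
v *= rng.uniform(0.99, 1.01, size=(64, 8))  # small row-to-row noise
compressed = maybe_compress(v, axis=0)
print(compressed.shape)  # (1, 8): a 64x saving along the compressed dimension
```

The compressed moment keeps a broadcastable shape, so the Adam update rule can consume it unchanged.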
Reviews: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
This review has 2 parts. The first part is my review of the paper as a standalone paper. The second part is a meta-commentary unifying my reviews for both this paper and "Neural Tangent Kernel for Any Architecture". Part 1 This paper demonstrates that infinitely-wide architectures made from a range of building blocks are Gaussian processes. Fundamentally, the paper seems to have two core contributions. This paper is a clean, elegant and logical next step in an important research direction.
asanAI: In-Browser, No-Code, Offline-First Machine Learning Toolkit
Koch, Norman, Ghiasvand, Siavash
Machine learning (ML) has become crucial in modern life, with growing interest from researchers and the public. Despite its potential, a significant entry barrier prevents widespread adoption, making it challenging for non-experts to understand and implement ML techniques. The increasing desire to leverage ML is counterbalanced by its technical complexity, creating a gap between potential and practical application. This work introduces asanAI, an offline-first, open-source, no-code machine learning toolkit designed for users of all skill levels. It allows individuals to design, debug, train, and test ML models directly in a web browser, eliminating the need for software installations and coding. The toolkit runs on any device with a modern web browser, including smartphones, and ensures user privacy through local computations while utilizing WebGL for enhanced GPU performance. Users can quickly experiment with neural networks and train custom models using various data sources, supported by intuitive visualizations of network structures and data flows. asanAI simplifies the teaching of ML concepts in educational settings and is released under an open-source MIT license, encouraging modifications. It also supports exporting models in industry-ready formats, empowering a diverse range of users to effectively learn and apply machine learning in their projects. The proposed toolkit is successfully utilized by researchers of ScaDS.AI to swiftly draft and test machine learning ideas, by trainers to effectively educate enthusiasts, and by teachers to introduce contemporary ML topics in classrooms with minimal effort and high clarity.
PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time
Pourali, Alireza, Boukani, Arian, Khazaei, Hamzeh
Training deep learning models, particularly Transformer-based architectures such as Large Language Models (LLMs), demands substantial computational resources and extended training periods. While optimal configuration and infrastructure selection can significantly reduce associated costs, this optimization requires preliminary analysis tools. This paper introduces PreNeT, a novel predictive framework designed to address this optimization challenge. PreNeT facilitates training optimization by integrating comprehensive computational metrics, including layer-specific parameters, arithmetic operations and memory utilization. A key feature of PreNeT is its capacity to accurately predict training duration on previously unexamined hardware infrastructures, including novel accelerator architectures. This framework employs a sophisticated approach to capture and analyze the distinct characteristics of various neural network layers, thereby enhancing existing prediction methodologies. Through proactive implementation of PreNeT, researchers and practitioners can determine optimal configurations, parameter settings, and hardware specifications to maximize cost-efficiency and minimize training duration. Experimental results demonstrate that PreNeT achieves up to 72% improvement in prediction accuracy compared to contemporary state-of-the-art frameworks.
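The idea of predicting training duration from accumulated layer-level computational metrics can be sketched as below. The FLOP formula, the linear calibration model, and the measurement numbers are illustrative assumptions, not PreNeT's fitted predictor.

```python
import numpy as np

def dense_layer_flops(batch, d_in, d_out):
    """Approximate forward+backward multiply-accumulates for a dense layer:
    forward costs batch*d_in*d_out MACs; backward roughly doubles that."""
    return 3 * batch * d_in * d_out

def predict_step_time(layers, batch, coef, intercept):
    """Predict one training-step time by accumulating per-layer FLOPs
    and applying a fitted linear model (coef in seconds per FLOP)."""
    total = sum(dense_layer_flops(batch, d_in, d_out) for d_in, d_out in layers)
    return coef * total + intercept

# Hypothetical calibration on one device: measured step times vs. total FLOPs
flops = np.array([1e9, 2e9, 4e9, 8e9])
times = np.array([0.011, 0.020, 0.041, 0.080])  # illustrative measurements, seconds
coef, intercept = np.polyfit(flops, times, 1)

mlp = [(784, 512), (512, 512), (512, 10)]
predicted = predict_step_time(mlp, batch=128, coef=coef, intercept=intercept)
print(f"{predicted:.4f} s per step")
```

Calibrating (coef, intercept) per accelerator is what would let such a model extrapolate to hardware it was not trained on.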
Designing deep neural networks for driver intention recognition
Vellenga, Koen, Steinhauer, H. Joe, Karlsson, Alexander, Falkman, Göran, Rhodin, Asli, Koppisetty, Ashok
Driver intention recognition studies increasingly rely on deep neural networks. Deep neural networks have achieved top performance on many different tasks, but it is not common practice to explicitly analyse the complexity and performance of the network's architecture. Therefore, this paper applies neural architecture search to investigate the effects of deep neural network architecture on a real-world safety-critical application with limited computational capabilities. We explore a pre-defined search space over three deep neural network layer types capable of handling sequential data (long short-term memory, temporal convolution, and time-series transformer layers), and the influence of different data fusion strategies on driver intention recognition performance. A set of eight search strategies is evaluated on two driver intention recognition datasets. For both datasets, we observed no search strategy that clearly samples better deep neural network architectures. However, performing an architecture search does improve model performance compared to the original manually designed networks. Furthermore, we observe no relation between increased model complexity and higher driver intention recognition performance. The results indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.
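The search setup described above, sampling a layer type and a fusion strategy from a pre-defined space, can be sketched with a plain random-search loop. The space values and the evaluation stand-in are illustrative assumptions; in the paper, evaluation means training and validating the candidate on a driver-intention dataset.

```python
import random

# Hypothetical search space mirroring the three sequential layer types
SEARCH_SPACE = {
    "layer_type": ["lstm", "temporal_conv", "ts_transformer"],
    "hidden_size": [32, 64, 128],
    "fusion": ["early", "late"],
}

def sample_architecture(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for the real objective (train + validate on the dataset);
    seeded by the config so the sketch stays deterministic."""
    return random.Random(str(sorted(arch.items()))).random()

def random_search(n_trials=20, seed=0):
    """One of the simplest of the eight possible search strategies:
    sample n_trials candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    trials = [sample_architecture(rng) for _ in range(n_trials)]
    return max(trials, key=evaluate)

best = random_search()
print(best)
```

Random search is a standard baseline here; the paper's observation that no strategy clearly dominates is consistent with such simple baselines being competitive.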
Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models
Getzner, Johannes, Charpentier, Bertrand, Günnemann, Stephan
Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2023. Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). Deep CNNs, such as VGG16 or ResNet50, already deliver great performance (Simonyan & Zisserman, 2014; He et al., 2015), yet the increasing number of layers in such models comes at the cost of severely increased computational complexity, resulting in the need for power-hungry hardware (Thompson et al., 2020; Jin et al., 2016). An example of a model that behaves extremely poorly in this regard is a big transformer with neural architecture search (Strubell et al., 2019). Clearly, training and running these models is not just a matter of financial cost, but also environmental impact. We address this by collecting high-quality energy data and building a first baseline model, capable of predicting the energy consumption of DL models by accumulating their estimated layer-wise energies.
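The layer-wise accumulation idea can be sketched as follows. The linear MACs-to-energy model and all numbers are illustrative assumptions, not the authors' fitted per-layer predictors.

```python
def estimate_model_energy(layers, per_layer_model):
    """Accumulate estimated per-layer energies into a model-level estimate,
    mirroring the layer-wise approach described above."""
    return sum(per_layer_model(layer) for layer in layers)

def linear_energy_model(layer, joules_per_mac=1e-10):
    """Toy stand-in: energy proportional to the layer's multiply-accumulate
    count. A real predictor would be fitted on measured energy data."""
    return layer["macs"] * joules_per_mac

# Hypothetical VGG-like layer list with rough MAC counts
vgg_like = [
    {"name": "conv1", "macs": 1.8e9},
    {"name": "conv2", "macs": 3.7e9},
    {"name": "fc", "macs": 4.1e8},
]
total_j = estimate_model_energy(vgg_like, linear_energy_model)
print(f"{total_j:.3f} J per forward pass")  # 0.591 J with these toy numbers
```

Swapping `linear_energy_model` for separately fitted models per layer type (conv, dense, pooling) is what turns this skeleton into a usable estimator.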
An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters
Keisler, Julie, Talbi, El-Ghazali, Claudel, Sandra, Cabriel, Gilles
While each new learning task typically requires the handcrafted design of a new DNN, automated deep learning facilitates their creation. The goals are to give less experienced people access to deep learning, to reduce the tedious work of tuning the many parameters needed to reach an optimal DNN, and, finally, to go beyond what humans can design by producing non-intuitive DNNs that may ultimately prove more efficient. Optimizing a DNN means automatically finding an optimal architecture for a given learning task: choosing the operations, the connections between those operations, and the associated hyperparameters. The first task is known as Neural Architecture Search (NAS) [Elsken et al., 2019], and the second as HyperParameter Optimization (HPO). Most works in the literature tackle only one of these two optimization problems. Many papers on NAS [White et al., 2021, Loni et al., 2020b, Wang et al., 2019b, Sun et al., 2018b, Zhong, 2020] focus on designing optimal architectures for computer vision tasks with many stacked convolution and pooling layers. Because each DNN training run is time-consuming, researchers have tried to reduce the search space by adding many constraints that prevent the search from finding irrelevant architectures. This limits the flexibility of the designed search spaces and restricts hyperparameter optimization.
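Treating NAS and HPO as one joint problem, as the abstract advocates, can be sketched by sampling a candidate that bundles both the architecture and its hyperparameters. The operation names, ranges, and depth bounds are illustrative assumptions, not the paper's actual search space.

```python
import random

def sample_candidate(rng):
    """Jointly sample an architecture (operations and depth) and its
    training hyperparameters, so NAS and HPO are searched together
    rather than as two separate problems."""
    depth = rng.randint(2, 5)
    ops = [rng.choice(["conv3x3", "pool", "identity"]) for _ in range(depth)]
    hparams = {
        "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform sampling
        "dropout": rng.choice([0.0, 0.1, 0.3]),
    }
    return {"ops": ops, "hparams": hparams}

rng = random.Random(42)
candidate = sample_candidate(rng)
print(candidate["ops"], candidate["hparams"])
```

A search algorithm (evolutionary, Bayesian, or plain random) would then score such candidates by training them, which is exactly where the constrained search spaces criticized above come from.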
💣Notes to Self: Convolutional Neural Networks (CNNs or ConvNets)
First, the name "convolutional" comes from the mathematical operation convolution. Cross-correlation and convolution are easily confused in machine learning: cross-correlation does not flip the kernel (or source image), whereas convolution does. Convolutional neural networks are the most widely used type of neural network for computer vision applications. CNNs are a family of deep neural networks that rely mainly on convolutions to achieve the expected task. One of the most famous articles about CNNs (LeNet), by Yann LeCun, is "Gradient-Based Learning Applied to Document Recognition."
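The flip distinction above is easy to see in 1D with NumPy, whose `convolve` flips the kernel while `correlate` does not. The example kernel is chosen to be antisymmetric so the two results visibly differ:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])  # antisymmetric kernel makes the flip visible

# True convolution flips the kernel before sliding it over the input.
conv = np.convolve(x, k, mode="valid")

# Cross-correlation (what most deep learning "conv" layers compute) does not flip.
xcorr = np.correlate(x, k, mode="valid")

print(conv)   # [2. 2.]
print(xcorr)  # [-2. -2.]  (opposite sign, since flipping k negates it)
```

Because learned kernels are free parameters, a network trained with cross-correlation simply learns the flipped weights, which is why the distinction rarely matters in practice but still matters for the terminology.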